Enforcing Relational Matching Dependencies with Datalog for Entity Resolution
Abstract
Entity resolution (ER) is the task of identifying and merging records in a database that represent the same real-world entity. Matching dependencies (MDs) have been introduced and investigated as declarative rules that specify ER policies. In general, an ER process induced by MDs over a dirty instance leads to multiple clean instances. General answer set programs (ASPs) have been proposed to specify the MD-based cleaning task and its results. In this work, we extend MDs to relational MDs, which capture more application semantics, and identify classes of relational MDs for which the general ASP can be automatically rewritten into a stratified Datalog program, with the single clean instance as its standard model.
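To make the idea of enforcing an MD concrete, the following is a minimal Python sketch, not the Datalog rewriting described in the abstract. It assumes a single hypothetical MD ("if two records have similar names, their addresses must be matched"), a hypothetical similarity test based on `difflib`, and a hypothetical matching function that keeps the longer address. The field names `name` and `addr` and all helper names are illustrative assumptions.

```python
from difflib import SequenceMatcher


def similar(a: str, b: str, threshold: float = 0.8) -> bool:
    """Hypothetical similarity relation: case-insensitive ratio above a threshold."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge(a: str, b: str) -> str:
    """Hypothetical matching function: keep the longer value, ties broken lexicographically."""
    return max(a, b, key=lambda v: (len(v), v))


def enforce_md(records):
    """Apply the assumed MD 'similar names => match addresses' until nothing changes."""
    recs = [dict(r) for r in records]  # work on copies, leave the input untouched
    changed = True
    while changed:
        changed = False
        for i in range(len(recs)):
            for j in range(i + 1, len(recs)):
                r, s = recs[i], recs[j]
                if similar(r["name"], s["name"]) and r["addr"] != s["addr"]:
                    # Both records receive the merged address value.
                    r["addr"] = s["addr"] = merge(r["addr"], s["addr"])
                    changed = True
    return recs


dirty = [
    {"name": "John Smith", "addr": "10 Main St"},
    {"name": "Jon Smith", "addr": "10 Main Street, Ottawa"},
    {"name": "Alice Wong", "addr": "5 Elm Rd"},
]
clean = enforce_md(dirty)
for row in clean:
    print(row)
```

In this toy setting the loop reaches a single fixpoint because the assumed matching function always moves values upward in a fixed total order. With several interacting MDs the order of application can matter, which is why multiple clean instances arise in general.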
Similar resources
ERBlox: Combining Matching Dependencies with Machine Learning for Entity Resolution
Entity resolution (ER), an important and common data cleaning problem, is about detecting duplicate data representations of the same external entities and merging them into single representations. Relatively recently, declarative rules called matching dependencies (MDs) have been proposed for specifying similarity conditions under which attribute values in database records are merged. In this...
MASTER OF SCIENCE Computational Mathematics and Modern Information Technologies
Entity-Relationship Data Model: Data structuring, Entity-Relationship Diagrams, Equivalence of Entity-Relationship and the Functional Modeling, Algorithms for translating Entity-Relationship Diagrams into Relational and Elementary Mathematical Data Models. Relational Data Model: The structure of the Relational Data Model, Relational Algebra, Relational Calculus, Relational Query Languages, Stati...
Query Rewriting Using Datalog for Duplicate Resolution
Matching Dependencies (MDs) are a recent proposal for declarative entity resolution. They are rules that specify, given the similarities satisfied by values in a database, what values should be considered duplicates, and have to be matched. On the basis of a chase-like procedure for MD enforcement, we can obtain clean (duplicate-free) instances; actually possibly several of them. The clean answ...
A Rule-Based Approach to Analyzing Database Schema Objects with Datalog
Database schema elements such as tables, views, triggers and functions are typically defined with many interrelationships. In order to support database users in understanding a given schema, a rule-based approach for analyzing the respective dependencies is proposed using Datalog expressions. We show that many interesting properties of schema elements can be systematically determined this way. ...
String-Oriented Databases
Relational databases and Datalog view each attribute as indivisible. This view, though useful in several applications, does not provide a suitable database paradigm for use in genetic, multimedia, or scientific databases. Data in these applications are unstructured; querying on substrings of attribute values is often necessary. Moreover, due to imprecision and incompleteness in the data, approx...